Note: There are often multiple ways to answer each question.
For the following questions, explain what went wrong and how the program can be fixed.
Load these packages before starting the problems:
library(ggplot2)
library(dplyr)
Add x and y together to get the value of 24:
x <- 14
y <- "10"
x + y
## Error in x + y: non-numeric argument to binary operator
We cannot add a numeric variable and a string together in this way. We first need to “coerce” y into a numeric variable:
x <- 14
y <- "10"
x + as.numeric(y)
## [1] 24
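Before doing arithmetic, it can help to check each variable's type with class(). The sketch below (our own illustration, not part of the original exercise) also shows what as.numeric() does with a string that is not a number: it returns NA with a warning rather than stopping with an error.

```r
x <- 14
y <- "10"

# class() reveals why x + y fails: y holds text, not a number
class(x)  # "numeric"
class(y)  # "character"

# Coercing a numeric-looking string works as expected
total <- x + as.numeric(y)
total  # 24

# A string that cannot be parsed becomes NA (R also emits a warning, suppressed here)
bad <- suppressWarnings(as.numeric("ten"))
is.na(bad)  # TRUE
```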
Take 1+2 and divide it by 3+4:
((1+2)/(3+4)))
## Error: <text>:1:14: unexpected ')'
## 1: ((1+2)/(3+4)))
## ^
There is one ) too many at the end of the line. We can fix this by removing it:
((1+2)/(3+4))
## [1] 0.4285714
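The inner parentheses are not optional either, because / binds more tightly than +. A quick comparison (our own illustration):

```r
# Without parentheses, division happens first: 1 + (2/3) + 4
no_parens <- 1+2/3+4
# With parentheses, both sums are computed before dividing
with_parens <- (1+2)/(3+4)

no_parens    # 5.666667
with_parens  # 0.4285714
```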
For the rest of the questions, we will use the mtcars dataset:
data(mtcars)
Store the vector (1, 2, ..., L-1) in x, where L is the number of columns in mtcars.
x <- 1:ncol(mtcars)-1
x
## [1] 0 1 2 3 4 5 6 7 8 9 10
The : operator takes precedence over -, meaning that it is evaluated first. Thus, the right-hand side is equivalent to c(1, 2, ..., ncol(mtcars)) - 1. Whenever we subtract a single number from a vector, that number is subtracted from each element, so we get 0, 1, ..., 10 instead of 1, 2, ..., 10.
We can fix this by inserting parentheses:
x <- 1:(ncol(mtcars)-1)
x
## [1] 1 2 3 4 5 6 7 8 9 10
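Another option, assuming we want the sequence 1, ..., ncol(mtcars) - 1, is seq_len(), which sidesteps the precedence question entirely. It also behaves better at the edge: if the upper bound were 0, 1:0 would count down to c(1, 0), whereas seq_len(0) returns an empty vector.

```r
data(mtcars)

# `:` binds tighter than `-`, so this subtracts 1 from every element of 1:11
wrong <- 1:ncol(mtcars) - 1
# Parentheses force the subtraction to happen first
right <- 1:(ncol(mtcars) - 1)
# seq_len(n) builds 1, 2, ..., n without any precedence pitfalls
also_right <- seq_len(ncol(mtcars) - 1)

identical(right, also_right)  # TRUE

# Edge case: `:` counts downward, seq_len() does not
1:0         # 1 0
seq_len(0)  # integer(0)
```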
Plot mpg vs. wt:
ggplot(data = mtcars) +
geom_point(y = mpg, x = wt)
## Error in layer(data = data, mapping = mapping, stat = stat, geom = GeomPoint, : object 'wt' not found
We forgot to wrap the arguments of geom_point in aes(...). Because of that, R treats x = wt as an ordinary function argument and looks for an object named wt in our environment, rather than a column of mtcars. No such object exists, hence the error. Fix:
ggplot(data = mtcars) +
geom_point(aes(y = mpg, x = wt))
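Another common arrangement puts aes() in the ggplot() call, so every layer inherits the mapping. The sketch below does that and uses ggplot2's layer_data(), which builds the plot and returns the computed data for one layer without drawing it, to confirm that x and y now come from the wt and mpg columns.

```r
library(ggplot2)

# Mapping declared once in ggplot(); geom_point() inherits it
p <- ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# One row per plotted point, with x and y taken from the mapped columns
pts <- layer_data(p, 1)
nrow(pts)  # 32, one per car in mtcars
```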
Plot a histogram of mpg:
ggplot(data = mtcars)
+ geom_histogram(aes(x = mpg))
## Error: Cannot use `+.gg()` with a single argument. Did you accidentally put + on a new line?
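The error message tells us what went wrong: R treats ggplot(data = mtcars) on its own line as a complete statement, so the next line starts with a stray +. We can fix it by moving the + to the end of the previous line: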
ggplot(data = mtcars) +
geom_histogram(aes(x = mpg))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
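The underlying rule is about how R parses lines: a line that is already a complete expression ends the statement, so a + at the start of the next line begins a new (unary) expression, while a trailing + tells R the expression is unfinished. The sketch below checks this with base R's parse(), and shows that wrapping the whole plot in parentheses also works, since R keeps reading until the closing parenthesis. The binwidth = 2 (in mpg units) is our own arbitrary choice, added only to address the bins message above.

```r
library(ggplot2)

# Two separate expressions: `1` ends the first line, then `+ 2` stands alone
length(parse(text = "1\n+ 2"))    # 2
# A trailing + makes R continue onto the next line
length(parse(text = "1 +\n2"))    # 1
# Inside parentheses, R keeps reading until the closing ), so a leading + is fine
length(parse(text = "(1\n+ 2)"))  # 1

# The same idea applied to the histogram
p <- (ggplot(data = mtcars)
      + geom_histogram(aes(x = mpg), binwidth = 2))
```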
Make a boxplot of mpg for each value of cyl, overlay it with points, and add a title to the plot:
ggplot(data = mtcars, aes(x = cyl, y = mpg)) +
geom_boxplot() +
geom_point()
labs(title = "Plot of mpg vs. cyl")
## Warning: Continuous x aesthetic -- did you forget aes(group=...)?
## $title
## [1] "Plot of mpg vs. cyl"
##
## attr(,"class")
## [1] "labels"
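There are 2 problems here. First, we only get one boxplot instead of three (cyl can take on the values 4, 6, or 8). This is because cyl is stored as a numeric (continuous) variable, so ggplot puts all the points into a single group. As the warning suggests, we can add group = cyl inside aes(...). Second, there is no title on the plot because we forgot to add a + after geom_point(). Without it, the ggplot expression ends there and labs(...) runs as a separate statement, which is why R printed the labels object instead. Fix: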
ggplot(data = mtcars, aes(x = cyl, y = mpg, group = cyl)) +
geom_boxplot() +
geom_point() +
labs(title = "Plot of mpg vs. cyl")
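A common alternative is to convert cyl to a factor, which makes the x-axis discrete so ggplot draws one box per level without an explicit group. The sketch below uses factor(cyl) and checks with layer_data() that the boxplot layer has three boxes whose middles are the per-group medians. Relabeling the axis with x = "cyl" is our own cosmetic choice.

```r
library(ggplot2)

# factor(cyl) makes x discrete: one box per cylinder count (4, 6, 8)
p <- ggplot(data = mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot() +
  geom_point() +
  labs(title = "Plot of mpg vs. cyl", x = "cyl")

# Layer 1 holds the computed boxplot statistics, one row per box
boxes <- layer_data(p, 1)
nrow(boxes)  # 3
```

The difference between the two fixes is visible on the x-axis: with factor(cyl) the axis shows just the three levels, evenly spaced, while with group = cyl the axis stays continuous and the boxes sit at their numeric positions.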
Make a scatterplot of qsec vs. wt, but we want all the points to be colored blue:
ggplot(data = mtcars, aes(y = qsec, x = wt)) +
geom_point(aes(col = "blue"))
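Inside aes(), "blue" is treated as a data value (a constant category named "blue"), so ggplot colors the points with its default palette, not blue, and adds a legend labeled "blue". If we want the color of the points NOT to be data-dependent, it should go outside the aes call: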
ggplot(data = mtcars, aes(y = qsec, x = wt)) +
geom_point(col = "blue")
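To see the difference concretely, we can compare the colors ggplot computes for each version. A minimal check using ggplot2's layer_data(): the mapped version gets a palette color, the set version gets literal blue.

```r
library(ggplot2)

# "blue" inside aes() is mapped as data: a one-level category named "blue"
p_mapped <- ggplot(data = mtcars, aes(y = qsec, x = wt)) +
  geom_point(aes(col = "blue"))
# "blue" outside aes() is set directly as the point color
p_set <- ggplot(data = mtcars, aes(y = qsec, x = wt)) +
  geom_point(col = "blue")

unique(layer_data(p_mapped)$colour)  # a default palette color, not "blue"
unique(layer_data(p_set)$colour)     # "blue"
```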
Create a new column named miles per quart and display the first 3 rows:
mtcars %>% mutate(miles per quart = mpg / 4) %>% head(n = 3)
## Error: <text>:1:25: unexpected symbol
## 1: mtcars %>% mutate(miles per
## ^
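Without backticks, R reads miles per quart as separate symbols and stops at the unexpected symbol per. If we want column names to contain spaces, we need to surround the new column name with backticks: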
mtcars %>% mutate(`miles per quart` = mpg / 4) %>% head(n = 3)
## mpg cyl disp hp drat wt qsec vs am gear carb miles per quart
## 1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 5.25
## 2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 5.25
## 3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 5.70
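Once a column name contains spaces, every later reference to it also needs backticks. The sketch below shows that, plus the common alternative of a syntactic name such as miles_per_quart (our own naming choice), which needs no quoting at all.

```r
library(dplyr)

df <- mtcars %>% mutate(`miles per quart` = mpg / 4) %>% head(n = 3)
# Backticks are needed again whenever the column is used
df$`miles per quart`  # 5.25 5.25 5.70

# A syntactic name avoids backticks altogether
df2 <- mtcars %>% mutate(miles_per_quart = mpg / 4) %>% head(n = 3)
df2$miles_per_quart   # 5.25 5.25 5.70
```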
Compute the mean of mpg for each value of gear:
mtcars %>% group_by(gear) %>% summarize(mean = mean)
## Error: Column `mean` is of unsupported type function
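We forgot to say which column to take the mean of. Writing mean = mean puts the function mean itself, rather than a computed value, into the new column, which is why summarize() complains about a column of type function. We need mean(mpg):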
mtcars %>% group_by(gear) %>% summarize(mean = mean(mpg))
## # A tibble: 3 x 2
## gear mean
## <dbl> <dbl>
## 1 3 16.1
## 2 4 24.5
## 3 5 21.4
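summarize() accepts several name = expression pairs at once, each evaluated within every group. As a sketch, the version below also counts the cars in each group with dplyr's n(); the column names mean_mpg and n are our own.

```r
library(dplyr)

res <- mtcars %>%
  group_by(gear) %>%
  summarize(mean_mpg = mean(mpg),  # mean of the mpg column within each gear group
            n = n())               # number of cars in each group
res
```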
Compute the maximum of hp and the maximum of disp in the dataset:
mtcars %>% summarize(max = max(hp, disp))
## max
## 1 472
The code above pools the two columns hp and disp, finds the single largest value across both, and returns just that value. To compute the maximum of each column, we need a separate call for each:
mtcars %>% summarize(max_hp = max(hp),
max_disp = max(disp))
## max_hp max_disp
## 1 335 472
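With more than a couple of columns, writing one call per column gets repetitive. In dplyr 1.0 and later, across() applies the same function to each selected column; the sketch below reproduces the max_hp / max_disp result, with the .names pattern producing those names. For contrast, the last line shows the original pooled maximum.

```r
library(dplyr)

# across() applies max() to hp and disp separately; .names sets the output names
res <- mtcars %>%
  summarize(across(c(hp, disp), max, .names = "max_{.col}"))
res  # max_hp = 335, max_disp = 472

# max() with several arguments pools them all, giving one overall value
max(mtcars$hp, mtcars$disp)  # 472
```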